Abstract


The purpose of this research was to build a classification model and to measure the correlation of
self-efficacy with visual-verbal preferences using data mining methods. This research used the J48
classifier and the linear projection method to examine patterns of data distribution between
self-efficacy and visual-verbal preferences. The measurement showed that engineering students’
self-efficacy does not correlate with visual-verbal preferences. However, engineering students’
self-efficacy influences the achievement of initial learning outcomes. Visual-verbal preference is
influenced more by students’ interest in images. It can therefore be concluded that self-efficacy
affects initial learning outcomes but is not correlated with visual-verbal preferences. The decision
tree provides results that are easily understood and presents the relationship between self-efficacy
and visual-verbal preferences in visual form.

Keywords: self-efficacy, visual-verbal preferences, data mining. 


Introduction 


Every student has a different level of self-efficacy because students differ in their initial
abilities and learning experiences. Self-efficacy influences people’s belief in their ability to
face failure and try harder to achieve success. Success can build a robust sense of confidence. If
someone achieves success easily, then he or she is easily discouraged by failure, because
confidence requires experience in overcoming the problems that occur. The experience gained
when dealing with problems becomes capital that helps improve self-efficacy (Bandura, 1994).
Self-efficacy refers to a person’s ability to draw on prior experience to solve
problems (Boswell, 2013). Individuals who have high self-efficacy have high confidence in dealing
with problems, while individuals with low self-efficacy fear failure (Wu, Tsai,
& Wang, 2011). Students process information and appraise their self-efficacy from their ability
and learning experience. Students’ success in overcoming problems can increase self-efficacy
and reduce failure (Schunk, 2003). Self-efficacy is formed from experience, common
experiences, social persuasions, and physiological reactions (Jordan, Amato-Henderson, Sorby,
& Donahue, 2011). Self-efficacy is very important in the engineering field. It
determines the action to be chosen, how much effort is made to solve a problem, how long
a person perseveres in the face of failure, and the level of achievement realized
(Bandura, 1997; Marra & Bogue, 2006). Engineering students need quantitative skills to prepare
themselves to face problems in engineering courses. Self-efficacy contributes to academic
performance, although problem-solving ability and intellectual ability also
influence learning outcomes (Aleta, 2016). The factors that influence engineering students’
confidence in their success are mastery experiences, vicarious experiences, social persuasions,
and physiological states. Classroom and curricular practices influence engineering students’
self-confidence, retention, and success (Hutchison, Follman, & Bodner, 2005). Engineering
self-efficacy can be measured using the self-efficacy scale developed by Bandura
(Bandura, 2006). Carberry et al. (2010) reported three findings on measuring self-concept in
engineering students: (1) the measurement design considers self-efficacy, motivation,
anxiety, and outcome expectation; (2) self-efficacy depends on the experience of the engineering
students; and (3) self-efficacy correlates highly with motivation, expertise, and outcome
expectation (Carberry, Hee-Sun, & Ohland, 2010).

Engineering courses are often associated with configuration, symbols,
codes, and topology. Engineering students process information based on their preferences.
When engineering students are presented with content consisting of images and text, they try
to process the content that matches their preferences (Kurniawan, Setyosari, Kamdi, & Ulfa,
2018). If a student who has visual preferences processes information from the content,
the first thing he or she looks for is image content. If the content is displayed only as
text, the student will still try to process the information even though the content does not
match his or her learning preferences (Peterson, 2016; Plass, Chun, Mayer, & Leutner, 1998).
Visualization is essential in learning because it can simplify information that is difficult to
understand (Sudatha, Degeng, & Kamdi, 2018). Students differ in their preferences for processing
information. Students also rely on self-efficacy when dealing with problems that arise while
processing information. Someone who has high self-efficacy tends to make more effort to process
information that does not match his or her preferences. Therefore, research is
needed to examine self-efficacy from the perspective of visual-verbal
preferences.


Problem of Research 


Data mining methods aim to determine a classification model by determining data
classes and grouping instances based on the similarity of their attributes. Previous research
measured correlation using descriptive statistical methods, for example in a study of the
correlation between the visualizer cognitive style and learning outcomes in design modelling
and performance (Pektas, 2013). Pektas (2013) used analysis of variance
(ANOVA) to determine whether cognitive style has any effect on design modelling and
performance. However, descriptive statistical methods are not optimal for presenting correlations
as data visualizations, so the tendency of one variable or class cannot be seen.
Therefore, measuring the correlation of self-efficacy with visual-verbal preferences using data
mining methods is needed as an alternative to descriptive statistical methods. This
research sought to build a classification model for the experimental data that had been collected.
It discusses an analytical method for measuring the correlation of engineering
students’ self-efficacy with visual-verbal preferences.


Research Focus 
The focus of this research was: (1) building a data mining classification model, and (2)
measuring the correlation of engineering students’ self-efficacy and visual-verbal preferences
using data mining methods.
Research Methodology





General Background 


This study used experimental research with data mining methods for educational data.
Data mining in this research is a classification technique assisted by the WEKA data mining
software (Abernethy, 2010; Witten, Frank, Hall, & Pal, 2017) and Orange data mining (Demšar et al.,
2013). Classification in this research uses the decision tree J48 (WEKA) and Linear
Projection (Orange) techniques. WEKA provides various methods for
classification (Kabakchiev et al., 2017). Decision tree J48 is the implementation of the C4.5
algorithm in WEKA. The C4.5 algorithm splits a node into several nodes based on the similarity
of attribute data. Linear projection provides an overview of the linear relationship between
engineering students’ self-efficacy and visual-verbal preferences, displayed in graphical form.
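
The node-splitting step can be illustrated with the gain ratio, the criterion that C4.5 is generally described as using to compare candidate attributes. The following is a minimal sketch in Python and not WEKA's J48 code; the function names and the four-row toy sample are hypothetical and are not the study's data.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def gain_ratio(rows, attribute, target):
    """Information gain of `attribute` on `target`, divided by split information."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attribute], []).append(row[target])
    total = len(rows)
    # Weighted entropy of the target after splitting on the attribute.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy([row[target] for row in rows]) - remainder
    split_info = entropy([row[attribute] for row in rows])
    return gain / split_info if split_info > 0 else 0.0


# Hypothetical toy sample, not the study's data.
toy = [
    {"interest": "yes", "self_efficacy": "High"},
    {"interest": "yes", "self_efficacy": "High"},
    {"interest": "no", "self_efficacy": "Low"},
    {"interest": "no", "self_efficacy": "Low"},
]
print(gain_ratio(toy, "interest", "self_efficacy"))
```

In this toy sample the attribute separates the two classes perfectly, so the gain ratio reaches its maximum; an attribute that takes the same value for every row would yield zero.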


Sample 


Participants in this research were 250 engineering students, with details of 72 female 
participants and 178 male participants, as shown in Table 1. 


Table 1. Distribution of participants.


Self-Efficacy         Gender    Verbal   Visual   Total
High Self-efficacy    Female         4       28      32
                      Male          26       47      73
Low Self-efficacy     Female        12       28      40
                      Male          28       77     105

Table 1 shows the preference and self-efficacy attributes for each gender.
Participants who had high self-efficacy consisted of 32 females and 73 males, and participants who
had low self-efficacy consisted of 40 females and 105 males. Preference has two attributes,
visual and verbal, while self-efficacy has two attributes, high and low. The
self-efficacy scale was measured with the self-efficacy instrument developed by Bandura (2006):
students whose score was at or above the mean were placed in the high
self-efficacy group, while students whose score was below the mean
were placed in the low self-efficacy group.
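
The grouping rule can be sketched as a mean split. This is only an illustration: the function name and the example scores are hypothetical, and assigning a score exactly equal to the mean to the high group is an assumption.

```python
from statistics import mean


def split_by_mean(scores):
    """Label each score 'High' if it is at or above the group mean, else 'Low'."""
    threshold = mean(scores)
    return ["High" if s >= threshold else "Low" for s in scores]


# Hypothetical scale totals, not the study's measurements.
example_scores = [55, 62, 70, 81, 90]
print(split_by_mean(example_scores))
```

A split at the mean does not force equal group sizes, which is consistent with the unequal groups reported here (105 high, 145 low).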


Instrument and Procedures 


Participants in this research were given several tests measuring visual-verbal
preferences, self-efficacy, and pre-test performance. Test results were processed using data
mining methods, namely decision tree and linear projection classification. Visual-verbal
preference was measured using the visual-verbal questionnaire (VVQ) developed by Richardson
(1977), which defined the VVQ categories (Richardson, 1977), with the question
items developed by Kirby (Kirby, Moore, & Schofield, 1988). Self-efficacy was measured with a
self-efficacy questionnaire (Bandura, 2006). The pre-test
used the questions given in the Cisco Networking Academy Program, Chapter
1, to measure initial ability (Cisco Systems, 2003). The data for the data mining process were
classified into five classes: self-efficacy, gender, interest in images, preferences,
and pre-test results, as can be seen in Figure 1.








No.  Name                      Type         Values
1    Self_Efficacy             categorical  High, Low
2    Gender                    categorical  Female, Male
3    Interest_in_Image_Object  categorical  Interest with Image, No interest with image
4    Preferences               categorical  Verbal, Visual
5    Pre-Test                  numeric


Figure 1. Class of data. 


Self-efficacy has two attributes, namely “high” and “low”; gender has two attributes,
namely “male” and “female”; image interest has two attributes, namely “interested” and “not
interested”; and preference has two attributes, namely “verbal” and “visual.” The pre-test result
is a numeric value.
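
One way to hold a single data instance with these five classes is a small record type that rejects values outside the listed attributes. The sketch below is an assumption about representation only; the class name `Instance`, the field names, and the example record are hypothetical, while the value labels follow those shown in Figure 1.

```python
from dataclasses import dataclass

# Allowed values for the four categorical classes, as labelled in Figure 1.
ALLOWED = {
    "self_efficacy": {"High", "Low"},
    "gender": {"Female", "Male"},
    "interest_in_image_object": {"Interest with Image", "No interest with image"},
    "preferences": {"Verbal", "Visual"},
}


@dataclass(frozen=True)
class Instance:
    self_efficacy: str
    gender: str
    interest_in_image_object: str
    preferences: str
    pre_test: float  # numeric pre-test result

    def __post_init__(self):
        # Reject any categorical value that is not one of the listed attributes.
        for name, allowed in ALLOWED.items():
            value = getattr(self, name)
            if value not in allowed:
                raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")


# Hypothetical record for illustration only.
record = Instance("High", "Male", "Interest with Image", "Visual", 85.0)
print(record)
```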


Data Analysis 


The classification phase in data mining consists of three stages: (1) experimental
data; (2) modelling; and (3) evaluation (Demšar et al., 2013). The experimental data were
presented as five classes and their attributes. The experimental data were tested with 10-fold
cross-validation because this test is effective for limited data (Kabakchiev et al., 2017). The
classification developed in this research can be seen in Figure 2.


[Figure 2: Orange workflow containing the File, Data Table, Linear Projection, and Tree Viewer
widgets.]


Figure 2. Classification model.

When the classification algorithm runs, the data are divided into two data
sets consisting of training data and test data. The algorithm runs ten times and produces a
classification model, as shown in Figure 3.
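
The ten runs can be sketched as a stratified 10-fold split, in which every instance is held out exactly once. This is a simplified illustration and not WEKA's exact fold assignment; the function name and the seed are hypothetical, while the class counts (105 high, 145 low) are those reported in this research.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Return k lists of indices; each class is spread evenly across the folds."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    folds = [[] for _ in range(k)]
    position = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of this class round-robin over the folds.
        for index in indices:
            folds[position % k].append(index)
            position += 1
    return folds


labels = ["High"] * 105 + ["Low"] * 145
folds = stratified_folds(labels)
for test_fold in folds:
    held_out = set(test_fold)
    train = [i for i in range(len(labels)) if i not in held_out]
    # A classifier would be fitted on `train` and scored on `test_fold` here.
```

With 250 instances, each run trains on 225 instances and tests on the remaining 25.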
[Figure 3: flow diagram titled “Data Mining Task Implementation - Classification” with three
stages. Experimental data preprocessing: experimental data with the classes self-efficacy (high,
low), gender (male, female), interest in image object (yes, no), preferences (visual, verbal),
and pre-test, passed to 10-fold cross-validation. Modelling: training data, data mining
algorithm, classification model. Evaluation: test data, result.]


Figure 3. Data mining task implementation-classification.


The classification model in Figure 2 shows the classes of data grouped based on the
similarity of attributes to form a decision tree model. Outlier data were processed by the
linear projection method. This method displays linear projections of class-labelled data.
The projection in question generalizes graphical projections and considers the effect of
projections on geographical objects (Orange Data Mining, 2015). Rules classify the attributes
by building algorithm models for decision making. The algorithm for building a
decision tree model can be seen in Figure 4.


[Figure 4: decision tree diagram with tests on “Self-efficacy=High”, “Preferences=Visual”, and
“Interest in Image object”, leading to the outcomes “No interest in Image object”,
“Pre-test=High”, and “Pre-test=Low”.]


Figure 4. Model of a decision tree classifier.


The decision tree model in this research predicts the class to which an instance most likely
belongs from the attributes the instance possesses. The class is the status of the instance and
is often referred to as the conclusion drawn from the data. Attributes are the information that
a class has. The decision tree model can be read as follows:


If self-efficacy='high' then interest in image object='no'
    Else if preferences='visual'
If self-efficacy='high' and preferences='visual' then interest in the image object
    Else if pre-test='high'
Research Results


The data set in this research consisted of 250 instances. Each instance has the
classes self-efficacy, gender, interest in the image object, preference, and pre-test result.
This research uses data mining methods with classification techniques to identify classes,
attributes, and instances. The research data are shown in Figure 5.





[Figure 5: data table listing, for each instance, the values of Self_Efficacy, Gender,
Interest_in_Image_Object, Preferences, and Pre-Test.]


Figure 5. Instance data of the research.


Figure 5 shows example instances with the attribute of each class and the pre-test value. The
self-efficacy class has 105 instances with the “high” attribute and 145 instances with the “low”
attribute. The distribution of self-efficacy can be seen in Figure 6.

Name: Self_Efficacy    Type: Nominal
Missing: 0 (0%)    Distinct: 2    Unique: 0 (0%)

No.   Label   Count   Weight
1     High    105     105.0
2     Low     145     145.0

[Bar chart of the counts, coloured by self-efficacy class (High, Low).]


Figure 6. Class of self-efficacy.


The “gender” class had 178 instances with the “male” attribute
and 72 instances with the “female” attribute. The distribution of the “gender” class can be seen
in Figure 7.














Name: Gender    Type: Nominal
Missing: 0 (0%)    Distinct: 2    Unique: 0 (0%)

No.   Label    Count   Weight
1     Male     178     178.0
2     Female   72      72.0

[Bar chart of the counts, coloured by self-efficacy class (High, Low).]


Figure 7. Class of gender.

The “interest in the image” class had 145 instances with the “yes” attribute
and 105 instances with the “no” attribute. The distribution of the “interest in the
image” class can be seen in Figure 8.

Name: Interest_in_Image_Object    Type: Nominal
Missing: 0 (0%)    Distinct: 2    Unique: 0 (0%)

No.   Label                    Count   Weight
1     Interest with Image      145     145.0
2     No interest with image   105     105.0

[Bar chart of the counts, coloured by self-efficacy class (High, Low).]


Figure 8. Class of interest in the image.


Figure 9 shows that the “preferences” class had 180 instances with the “visual” attribute
and 70 instances with the “verbal” attribute.

















Name: Preferences    Type: Nominal
Missing: 0 (0%)    Distinct: 2    Unique: 0 (0%)

No.   Label    Count   Weight
1     Visual   180     180.0
2     Verbal   70      70.0

[Bar chart of the counts, coloured by self-efficacy class (High, Low).]


Figure 9. Class of preferences.

Participants then took the pre-test, which measured their initial ability
level. The minimum score obtained by participants was 60 and the maximum was
95; the mean was 76.288 and the standard deviation was 8.184, as shown in Figure 10.

Name: Pre-Test    Type: Numeric
Missing: 0 (0%)    Distinct: 9    Unique: 0 (0%)

Statistic   Value
Minimum     60
Maximum     95
Mean        76.288
StdDev      8.184

[Histogram of pre-test values, coloured by self-efficacy class (High, Low).]


Figure 10. Means of pre-test.


J48 Classifier Analysis


The J48 classifier is a data mining method that implements the C4.5 algorithm to build
a decision tree model. The decision tree model is created to form a classification model
(Bhuvaneswari, Prabaharan, & Subramanityaswamy, 2015). The accuracy obtained in
this research is 66%, and the mean absolute error (MAE) is 0.3855. MAE measures the
accuracy of predictions by averaging the absolute values of the errors. The analysis
in this section uses WEKA to form the classification model. The classification
process in WEKA produces a confusion matrix, which measures classification performance,
that is, how well the system classifies the data. The confusion matrix has two rows. The first
row, “41 64”, shows that of the (41 + 64) instances whose actual class is self-efficacy “high”,
41 are correctly classified as “high” and 64 are misclassified as “low”. The second row,
“21 124”, shows that of the (21 + 124) instances whose actual class is self-efficacy “low”,
21 are misclassified as “high” and 124 are correctly classified as “low”, as in Figure 11.
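
The summary statistics can be recomputed from the four cells of the confusion matrix. The sketch below uses the standard definitions of accuracy, recall, precision, Cohen's kappa, and the Matthews correlation coefficient (MCC); the variable names are illustrative and the code is not part of WEKA.

```python
from math import sqrt

# Rows are actual classes (High, Low); columns are predicted classes (High, Low).
matrix = [[41, 64], [21, 124]]

tp, fn = matrix[0]  # actual High: predicted High, predicted Low
fp, tn = matrix[1]  # actual Low: predicted High, predicted Low
total = tp + fn + fp + tn

accuracy = (tp + tn) / total
recall_high = tp / (tp + fn)
precision_high = tp / (tp + fp)
recall_low = tn / (tn + fp)
precision_low = tn / (tn + fn)

# Cohen's kappa: observed agreement corrected for chance agreement.
expected = ((tp + fn) * (tp + fp) + (fp + tn) * (fn + tn)) / total**2
kappa = (accuracy - expected) / (1 - expected)

# Matthews correlation coefficient for the 2x2 case.
mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))

print(f"accuracy={accuracy:.3f} kappa={kappa:.4f} mcc={mcc:.3f}")
print(f"High: recall={recall_high:.3f} precision={precision_high:.3f}")
print(f"Low:  recall={recall_low:.3f} precision={precision_low:.3f}")
```

The recomputed accuracy, kappa statistic, and MCC agree with the values in the WEKA summary (66%, 0.2604, and 0.281).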

J48 pruned tree

Pre-Test <= 85
|   Interest_in_Image_Object = Interest with Image
|   |   Pre-Test <= 80: Low (108.0/47.0)
|   |   Pre-Test > 80: High (16.0/3.0)
|   Interest_in_Image_Object = No interest with image: Low (101.0/21.0)
Pre-Test > 85: High (25.0/1.0)

Number of Leaves : 4

Size of the tree : 7

Time taken to build model: 0.07 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances       165        66 %
Incorrectly Classified Instances      85        34 %
Kappa statistic                        0.2604
Mean absolute error                    0.3855
Root mean squared error                0.4489
Relative absolute error               79.1021 %
Root relative squared error           90.9317 %
Total Number of Instances            250

=== Detailed Accuracy By Class ===

               TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
               0.390    0.145    0.661      0.390   0.491      0.281  0.706     0.663     High
               0.855    0.610    0.660      0.855   0.745      0.281  0.706     0.727     Low
Weighted Avg.  0.660    0.414    0.660      0.660   0.638      0.281  0.706     0.700

=== Confusion Matrix ===

   a    b   <-- classified as
  41   64 |   a = High
  21  124 |   b = Low


Figure 11. Confusion matrix.


Data Visualization Analysis


The research used data visualization with a decision tree and a linear projection approach
to obtain a classification model. Linear projection is a machine learning method that refers to
the number of populations from an instance. The linear projection method presents information
about statistical correlations and about the linearity of specific aggregate values. Students’
learning preferences are related to their interest in images, while the level of
self-efficacy does not correlate with visual-verbal preferences, as shown in Figure 12.
Figure 12. Linear projection of data visualization obtained for the data class.
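
The absence of a relationship between self-efficacy and visual-verbal preferences can also be checked numerically from the counts in Table 1. The sketch below is a swapped-in check using the chi-square statistic and the phi coefficient for a 2x2 table; it is not the linear projection method used in this research, and the variable names are illustrative.

```python
from math import sqrt

# Counts from Table 1, summed over gender: (Verbal, Visual).
high = (4 + 26, 28 + 47)    # High self-efficacy
low = (12 + 28, 28 + 77)    # Low self-efficacy

a, b = high
c, d = low
n = a + b + c + d

# Pearson chi-square statistic for a 2x2 table (no continuity correction).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# Phi coefficient: effect size ranging from 0 (no association) to 1.
phi = sqrt(chi2 / n)

share_visual_high = b / (a + b)
share_visual_low = d / (c + d)
print(f"chi2={chi2:.3f} phi={phi:.3f}")
print(f"visual share: high={share_visual_high:.1%} low={share_visual_low:.1%}")
```

The share of visual learners is almost the same in both self-efficacy groups (about 71% and 72%), and the chi-square statistic is far below 3.841, the 5% critical value for one degree of freedom.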


Figure 12 shows that male students have more interest in images than
female students. This is reinforced by the finding that more male students than female students
have visual preferences. The decision tree first divides students by pre-test result into two
groups, namely students with a score > 85 and students with a score <= 85. Students with
a pre-test score > 85 have high self-efficacy, whereas students with a pre-test score <= 85
are further distinguished by their interest in images, as shown in Figure 13.


Pre-Test
|   <= 85: Interest_in_Image_Object
|   |   = Interest with Image: Pre-Test
|   |   |   <= 80: Low (108.0/47.0)
|   |   |   > 80: High (16.0/3.0)
|   |   = No interest with image: Low (101.0/21.0)
|   > 85: High (25.0/1.0)


Figure 13. The decision tree of correlation learning outcomes with the results of
the pre-test and self-efficacy.
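
The branches of the tree can be written as a short function, and the leaf annotations give the accuracy on the full data set. This is a sketch under the usual reading of the WEKA notation, where (n/e) means that n instances reach the leaf and e of them are misclassified; the function and variable names are hypothetical.

```python
def predict_self_efficacy(pre_test, interest_in_image):
    """Follow the branches of the pruned J48 tree reported in Figure 11."""
    if pre_test > 85:
        return "High"
    if not interest_in_image:
        return "Low"
    return "High" if pre_test > 80 else "Low"


# Leaf annotations: (instances reaching the leaf, instances misclassified there).
leaves = [(108.0, 47.0), (16.0, 3.0), (101.0, 21.0), (25.0, 1.0)]
reached = sum(n for n, _ in leaves)
errors = sum(e for _, e in leaves)
training_accuracy = (reached - errors) / reached

print(predict_self_efficacy(90, False))
print(predict_self_efficacy(82, True))
print(predict_self_efficacy(82, False))
print(f"{training_accuracy:.3f}")
```

The four leaves cover all 250 instances with 72 errors, so the tree fits 71.2% of the data it was built from, somewhat above the 66% obtained under 10-fold cross-validation.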
Figure 13 shows that interest in images can affect the pre-test result (> 80) if students
have high self-efficacy. It can therefore be concluded that high self-efficacy directly affects
pre-test achievement. In contrast, interest in images and visual-verbal
preferences do not influence pre-test achievement. Students who have
visual preferences and an interest in images can obtain good pre-test results if they also have
high self-efficacy.


Discussion 


This research showed that 58% of participants have a low level of self-efficacy and
42% have a high level of self-efficacy. However, the results of the research
showed no correlation between the level of self-efficacy and visual-verbal preferences, as
indicated by the distribution of the results shown in Figure 14.


[Figure 14: horizontal bar chart for visual preferences and verbal preferences, grouped by low
and high self-efficacy; horizontal axis from 0 to 120.]


Figure 14. Overall accuracy of classifiers.


The decision tree method can present a correlation and classification model based on
the similarity of attributes of data instances. This is consistent with previous research showing
that the method can help establish a classification model for experimental data (Kabakchiev
et al., 2017). The classification model classifies data instances by their respective attributes
(Demšar et al., 2013; Singh, Naveen, & Samota, 2013). The decision tree method in this study
can predict the group of each instance, with each instance grouped according to the associated
rules (see the results in Figure 13). This agrees with Apte and Weiss (1997), who state that
classification in data mining is used to predict problems and to group problems based on the
objectives to be achieved (Apte & Weiss, 1997). A decision tree uses training sets to
build prediction models and classify input data (Singh et al., 2013). The classification
model was developed with predictive algorithms that classify a population into branches
consisting of root nodes, internal nodes, and leaf nodes (Yan-yan Song & Ying Lu, 2015).

An additional method that can be used is linear projection. Linear
projection can display statistical projection information and describe the tendency of one class
toward another class. The implementation of data mining methods, such as decision tree
classification and linear projection, is beneficial for measuring correlations between classes
and attributes of experimental data.
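
The idea of a linear projection can be sketched as a weighted sum of axis vectors, one per feature, which maps each instance to a point in the plane. The equal-angle placement of the axes, the numeric encoding of the features, and the function names are assumptions made for illustration; they are not taken from the Orange workflow used in this research.

```python
from math import cos, sin, pi


def circular_anchors(n_features):
    """Place one axis per feature at equal angles on the unit circle."""
    return [(cos(2 * pi * j / n_features), sin(2 * pi * j / n_features))
            for j in range(n_features)]


def linear_projection(rows, anchors):
    """Map each feature vector x to the 2-D point sum_j x[j] * anchors[j]."""
    points = []
    for x in rows:
        px = sum(value * ax for value, (ax, _) in zip(x, anchors))
        py = sum(value * ay for value, (_, ay) in zip(x, anchors))
        points.append((px, py))
    return points


# Hypothetical encoding: (interest in images, visual preference, pre-test scaled to 0..1).
rows = [(1, 1, 0.9), (0, 0, 0.2), (1, 0, 0.5)]
points = linear_projection(rows, circular_anchors(3))
for point in points:
    print(point)
```

Instances with similar feature values land close together, so a class whose points are spread evenly across the plot shows no tendency toward either value of the other features.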

This research found that self-efficacy affected the results of the pre-test, which agrees
with Abosede and Adesanya, who reported that self-efficacy influences
performance and problem-solving abilities (Abosede & Adesanya, 2017). Other studies
also found that engineering self-efficacy had a significant correlation with academic
achievement (Aleta, 2016).
Conclusions


Classification techniques in data mining are designed to classify data instances and aim
to build a classification model for experimental data. Classification forms a decision tree to
group data instances based on their attributes. The resulting classes form the model
according to the prediction problem presented. This research examined engineering self-efficacy
in relation to visual-verbal preferences and the results of the pre-test. There is a need for
analysis of experimental results such as the relationship between self-efficacy and
visual-verbal tendencies. This research proposes an alternative method for measuring the
correlation of self-efficacy with visual-verbal preferences, namely the J48 classifier technique
and linear projection.

The J48 classifier is an algorithm used to construct a decision tree as a statistical classifier,
while linear projection presents information about the linear relationship among several
measured variables. The measurement results show that self-efficacy does not correlate with
visual-verbal preferences. The proposed method can also be applied to the measurement of
correlation with more classes and larger numbers of data instances. This research found that
data mining methods, especially decision trees, can be used to analyse correlations
between several variables. The decision tree method can be used as an alternative
to statistical methods for measuring the correlation between variables. Further research
is expected to develop other data mining methods specifically for processing educational data.